(54) Process for selecting candidate drug compounds



(57) The invention relates to a process for drug can- 
didate identification, said process comprising the steps 
of: 

(1) obtaining a computerised representation of the 
three-dimensional structure of a binding site on the 
surface of a biological macromolecule; 

(2) generating a computerised model of the func- 
tional structure of said binding site which may be 
used to identify favourable and unfavourable inter- 
actions between the binding site and a drug candi- 
date molecule; 

(3) identifying a molecular fragment (or "template" 
T) capable of placement within said binding site and 
capable of carrying at least one (preferably a plu- 
rality (ie. at least two) and especially preferably at 
least 3) substituent group, said molecular fragment 
either being capable of being synthesized from re- 
agent compounds accessible in substituted form 
whereby to import said substituent groups on syn- 
thesis of said molecular fragment or being present 
in an accessible reagent compound capable of sub- 
stitution with said substituent groups by reaction 
with further accessible reagent compounds; 

(4) generating a set of lists of accessible reagent
compounds (eg. a₁-A, a₂-A, a₃-A, etc, b₁-B, b₂-B,
b₃-B, etc, c₁-C, c₂-C, c₃-C, etc), the lists being such
that a combination of compounds taken from each
list (eg. a₁-A, b₃-B and c₁₁-C) may be reacted to
produce a candidate compound comprising said
molecular fragment carrying a plurality of substituent
groups (eg. a₁b₃c₁₁T) thereby generating a first
virtual library of candidate compounds being the
theoretical set of compounds producible by reaction of
the members of said lists (ie. a₁b₁c₁T, a₁b₁c₂T,
a₁b₂c₁T etc), each member of each list comprising
a component (eg. A, B, C, etc.) common to the other
members of that list and a component (eg. a₁, b₁,
c₁ etc) unique within that list;

(5) for each said list limiting the number of members 
thereof using a first set of exclusion rules thereby 
to generate a restricted second virtual library of can- 
didate compounds, the operation of said first set of 
rules involving for each member of each list com- 
puterised comparison for favourable or unfavoura- 
ble interactions between said computerised model 
and a structure comprising said molecular fragment 
and a substituent deriving from the unique compo- 
nent within said list of that member, the molecular 
fragment and the computerized model being held in 
fixed spatial relationship to each other for said com- 
parison; 

(6) evaluating and ranking by computer the mem- 
bers of said second virtual library for favourable and 
unfavourable interactions with said computerised 
model and thereby generating a restricted third vir- 
tual library of candidate compounds ranked as hav- 
ing favourable interactions; 

(7) optionally, selecting from said third virtual library 
at least one further molecular fragment and repeating
steps (4), (5) and (6) to generate an alternative
third virtual library; 

(8) screening said third virtual library using a second 
set of exclusion rules thereby to generate a restrict- 
ed fourth virtual library of candidate compounds 
comprising compounds which are candidates for 
synthesis and experimental evaluation for drug ef- 
ficacy; 

(9) synthesizing some or all candidate compounds 
of said fourth virtual library to produce a candidate 
compound library; 

(10) experimentally evaluating the compounds of 
said candidate compound library for drug efficacy; 

(11) analysing the experimental efficacy data gen- 
erated in step (10) for structure-activity relationship 
information; 

(12) using the information derived in step (11) se- 
lecting a revised set of lists of accessible reagent 
compounds, said lists being expanded to include 
selected reagents not present in the restricted lists 
generated in step (5) and optionally restricted to ex- 
clude selected reagents present in the restricted 



lists generated in step (5); 

(13) repeating steps (6) and (7) to identify further 
compounds which are candidates for synthesis and 
experimental evaluation for drug efficacy; 

(14) synthesising and experimentally evaluating 
said further compounds for drug efficacy;

(15) if required repeating steps (11) to (14) one or 
more times; 

(16) identifying as a lead candidate a compound 
synthesized and experimentally evaluated as 
above. 

The process of the invention is characterised by the 
rapid generation of a relatively small set of readily syn- 
thesisable candidate compounds with a high success 
rate in terms of drug efficacy and hence a high predictive 
value for directing subsequent iterations. 
Description

FIELD OF THE INVENTION 

This invention relates to a process for selecting lead candidate drug compounds, and in particular to such a process
in which synthesis of candidate compounds is simplified and minimized and success rate with synthesized compounds
is maximized.

BACKGROUND OF THE INVENTION 


Drug discovery has been a time and resource consuming exercise. Traditionally, key steps in drug discovery have 
included the identification of a compound or set of compounds having the desired drug property, the identification of 
the active structure within such compounds and the identification of a lead candidate, a compound which incorporates 
that structure and combines adequate activity with acceptable toxicity and synthetic accessibility. By acceptable
synthetic accessibility it is meant that the lead compound should be producible via a synthetic route which is sufficiently
straightforward and inexpensive that commercial production of the compound is a viable option.

The identification of active compounds has involved screening of extensive compound libraries for the desired 
drug property. Recently, the technique known as combinatorial chemistry has offered a moderate cost route to the 
synthesis of very large compound libraries which can be screened in this manner. Although it is now increasingly being
applied to the synthesis of libraries of non-peptide organic molecules, the combinatorial chemistry technique is
especially applicable to the production of libraries of peptide and peptoid compounds, and synthesis and testing of such
compound libraries can even be automated and operated under computer control. Thus for example an alternative 
approach to drug discovery using computer-controlled combinatorial chemistry is described by 3-Dimensional Phar- 
maceuticals Inc. in WO-A-96/08781. 

Unfortunately, however, the peptide and peptoid compounds for which such a combinatorial chemistry approach
is particularly suited, due to the ease with which peptide molecules can be produced with a multiplicity of sequences
on automated peptide synthesizers, often display undesirable pharmacokinetics, such as poor bioavailability.

An alternative approach to drug discovery has also developed over recent years. This approach, referred to variously
as Structure-Based Drug Design (SBDD) or Computer Aided Molecular Design (CAMD), involves structural analysis
of the receptor site for the drug molecule and can involve computerized generation of a molecular structure which
is capable of binding to that site, ie. a structure which has an appropriate structural framework to fit within the receptor
site and which is so functionalized as to have favourable interactions with selected functional components of the
receptor site.

One example of the SBDD system is the PRO_LIGAND system of Proteus Molecular Design Limited. This is
described for example by Clark et al in a series of papers: J. Comput.-Aided Mol. Design 9: 13-32 (1995), 9: 139-148
(1995), 9: 213-225 (1995) and 9: 381-395 (1995), J. Med. Chem. 37: 3994-4002 (1994), and J. Chem. Inf. Comput.
Sci. 35: 914-923 (1995).

While highly effective, SBDD serves to generate and assess molecular structures on the basis of predicted activity 
without particular regard to synthetic accessibility. These molecules must then be made and tested and subsequent
optimization to produce a lead candidate may require time consuming, complicated or expensive chemical syntheses.

It has now been recognised that by combining certain of the features of combinatorial chemistry with certain features 
of SBDD one can produce a drug discovery system in which only a relatively limited compound library need be generated 
before a range of active compounds is identified, that this range of active compounds may provide sufficient
structure-activity relationship information for a lead candidate to be identified with relatively little iteration (ie. relatively little
extension of the library that is initially generated and tested), and that the library may be generated on rational principles
ensuring that the vast majority of compounds in the library may be synthetically readily accessible.

In other words, in using the process of the invention to generate the structure-activity information necessary to
identify a lead candidate one may avoid the need to make and test the large compound libraries required by prior art
routine screening or by combinatorial chemistry and, unlike prior art SBDD techniques, the active compounds identified
will implicitly be synthetically readily accessible.

Thus viewed from one aspect the invention provides a process for drug candidate identification, said process 
comprising the steps of: 

(1) obtaining a computerised representation of the three-dimensional structure of a binding site on the surface of
a biological macromolecule;

(2) generating a computerised model of the functional structure of said binding site which may be used to identify 
favourable and unfavourable interactions between the binding site and a drug candidate molecule; 
(3) identifying a molecular fragment (or "template" T) capable of placement within said binding site and capable
of carrying at least one (preferably a plurality (ie. at least two) and especially preferably at least 3) substituent
group, said molecular fragment either being capable of being synthesized from reagent compounds accessible in 
substituted form whereby to import said substituent groups on synthesis of said molecular fragment or being present 
in an accessible reagent compound capable of substitution with said substituent groups by reaction with further 
accessible reagent compounds; 

(4) generating a set of lists of accessible reagent compounds (eg. a₁-A, a₂-A, a₃-A, etc, b₁-B, b₂-B, b₃-B, etc,
c₁-C, c₂-C, c₃-C, etc), the lists being such that a combination of compounds taken from each list (eg. a₁-A, b₃-B and
c₁₁-C) may be reacted to produce a candidate compound comprising said molecular fragment carrying a plurality
of substituent groups (eg. a₁b₃c₁₁T) thereby generating a first virtual library of candidate compounds being the
theoretical set of compounds producible by reaction of the members of said lists (ie. a₁b₁c₁T, a₁b₁c₂T, a₁b₂c₁T
etc), each member of each list comprising a component (eg. A, B, C, etc.) common to the other members of that
list and a component (eg. a₁, b₁, c₁ etc) unique within that list;

(5) for each said list limiting the number of members thereof using a first set of exclusion rules thereby to generate 
a restricted second virtual library of candidate compounds, the operation of said first set of rules involving for each 
member of each list computerised comparison for favourable or unfavourable interactions between said compu- 
terised model and a structure comprising said molecular fragment and a substituent deriving from the unique 
component within said list of that member, the molecular fragment and the computerized model being held in fixed 
spatial relationship to each other for said comparison; 

(6) evaluating and ranking by computer the members of said second virtual library for favourable and unfavourable 
interactions with said computerised model and thereby generating a restricted third virtual library of candidate
compounds ranked as having favourable interactions; 

(7) optionally, selecting from said third virtual library at least one further molecular fragment and repeating steps 
(4), (5) and (6) to generate an alternative third virtual library; 

(8) screening said third virtual library using a second set of exclusion rules thereby to generate a restricted fourth 
virtual library of candidate compounds comprising compounds which are candidates for synthesis and experimental 
evaluation for drug efficacy; 

(9) synthesizing some or all candidate compounds of said fourth virtual library to produce a candidate compound 
library; 

(10) experimentally evaluating the compounds of said candidate compound library for drug efficacy; 

(11) analysing the experimental efficacy data generated in step (10) for structure-activity relationship information; 

(12) using the information derived in step (11) selecting a revised set of lists of accessible reagent compounds, 
said lists being expanded to include selected reagents not present in the restricted lists generated in step (5) and 
optionally restricted to exclude selected reagents present in the restricted lists generated in step (5); 

(13) repeating steps (6) and (7) to identify further compounds which are candidates for synthesis and experimental
evaluation for drug efficacy; 

(14) synthesising and experimentally evaluating said further compounds for drug efficacy; 

(15) if required repeating steps (11) to (14) one or more times;

(16) identifying as a lead candidate a compound synthesized and experimentally evaluated as above. 

Viewed from an alternative aspect the invention provides a method of manufacturing a drug substance, said method 
comprising the steps of: 

(1) obtaining a computerised representation of the three-dimensional structure of a binding site on the surface of 
a biological macromolecule; 
(2) generating a computerised model of the functional structure of said binding site which may be used to identify
favourable and unfavourable interactions between the binding site and a drug candidate molecule; 

(3) identifying a molecular fragment capable of placement within said binding site and capable of carrying at least 
one substituent group, said molecular fragment either being capable of being synthesized from reagent compounds 
accessible in substituted form whereby to import said substituent groups on synthesis of said molecular fragment 
or being present in an accessible reagent compound capable of substitution with said substituent groups by reaction 
with further accessible reagent compounds;

(4) generating a set of lists of accessible reagent compounds, the lists being such that a combination of compounds 
taken from each list may be reacted to produce a candidate compound comprising said molecular fragment carrying 
a plurality of substituent groups thereby generating a first virtual library of candidate compounds being the
theoretical set of compounds producible by reaction of the members of said lists, each member of each list comprising
a component common to the other members of that list and a component unique within that list; 

(5) for each said list limiting the number of members thereof using a first set of exclusion rules thereby to generate
a restricted second virtual library of candidate compounds, the operation of said first set of rules involving for each 
member of each list computerised comparison for favourable or unfavourable interactions between said compu- 
terised model and a structure comprising said molecular fragment and a substituent deriving from the unique 
component within said list of that member, the molecular fragment and the computerised model being held in fixed
spatial relationship to each other for said comparison; 

(6) evaluating and ranking by computer the members of said second virtual library for favourable and unfavourable 
interactions with said computerised model and thereby generating a restricted third virtual library of candidate
compounds ranked as having favourable interactions; 

(7) optionally, selecting from said third virtual library at least one further molecular fragment and repeating steps 
(4), (5) and (6) to generate an alternative third virtual library; 

(8) screening said third virtual library using a second set of exclusion rules thereby to generate a restricted fourth
virtual library of candidate compounds comprising compounds which are candidates for synthesis and experimental 
evaluation for drug efficacy; 

(9) synthesizing some or all candidate compounds of said fourth virtual library to produce a candidate compound
library; 

(10) experimentally evaluating the compounds of said candidate compound library for drug efficacy; 

(11) analysing the experimental efficacy data generated in step (10) for structure-activity relationship information; 

(12) using the information derived in step (11) selecting a revised set of lists of accessible reagent compounds,
said lists being expanded to include selected reagents not present in the restricted lists generated in step (5) and 
optionally restricted to exclude selected reagents present in the restricted lists generated in step (5); 

(13) repeating steps (6) and (7) to identify further compounds which are candidates for synthesis and experimental
evaluation for drug efficacy; 

(14) synthesising and experimentally evaluating said further compounds for drug efficacy; 

(15) if required repeating steps (11) to (14) one or more times; 

(16) identifying as a lead candidate a compound synthesized and experimentally evaluated as above; 

(17) manufacturing the compound identified in step (16) above; and, optionally, 

(18) admixing the compound manufactured in step (17) above with at least one pharmaceutically acceptable carrier
or excipient. 
products of such compounds as well as salts, esters, and other feasible chemical transformations. If desired, the
members of the resulting lists may be grouped into groups deemed likely to have similar properties in any resulting candidate
compounds and also ranked in terms of accessibility (ie. expense, delivery delay, complexity of any required transfor- 

The limiting of the lists of reagent compounds in step (5) may conveniently, as discussed above, comprise restrictions
of groups of compounds deemed analogous, elimination of low accessibility compounds, and elimination of compounds
with too high a molecular weight, too large a total atom count or with substituent groups thought to possess
undesired properties (eg. charge or reactivity with other groups such that template production may be hindered).
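
By way of illustration only, exclusion rules of the kind just listed might be sketched as follows. The field names, the numerical thresholds and the accessibility scale are hypothetical and are not taken from the process as described; the sketch simply keeps the most accessible representative of each group of analogous reagents and discards reagents that are too heavy, too large, charged or hard to obtain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reagent:
    name: str
    mol_weight: float    # g/mol
    atom_count: int      # total atom count
    formal_charge: int
    accessibility: int   # hypothetical scale: 1 = off the shelf, 5 = hard to obtain
    analogue_group: str  # reagents deemed analogous share a label


def apply_first_exclusion_rules(reagents, max_mw=250.0, max_atoms=40,
                                max_accessibility=3, per_group=1):
    """Return the reagents surviving the (assumed) exclusion rules."""
    kept, taken = [], {}
    # Visit the most accessible, lightest reagents first so that they
    # become the representatives of their analogue group.
    for r in sorted(reagents, key=lambda r: (r.accessibility, r.mol_weight)):
        if r.mol_weight > max_mw or r.atom_count > max_atoms:
            continue  # too heavy or too many atoms
        if r.formal_charge != 0:
            continue  # charged group, treated here as an undesired property
        if r.accessibility > max_accessibility:
            continue  # low accessibility
        if taken.get(r.analogue_group, 0) >= per_group:
            continue  # analogue group already represented
        taken[r.analogue_group] = taken.get(r.analogue_group, 0) + 1
        kept.append(r)
    return kept


# Hypothetical list of reagents a1-A ... a6-A for one attachment point.
list_a = [
    Reagent("a1-A", 120.0, 15, 0, 1, "benzyl"),
    Reagent("a2-A", 134.0, 18, 0, 2, "benzyl"),    # analogue of a1-A
    Reagent("a3-A", 410.0, 60, 0, 1, "steroid"),   # too heavy
    Reagent("a4-A", 98.0, 12, 1, 1, "ammonium"),   # charged
    Reagent("a5-A", 150.0, 20, 0, 5, "indole"),    # hard to obtain
    Reagent("a6-A", 101.0, 14, 0, 2, "alkyl"),
]
restricted_a = apply_first_exclusion_rules(list_a)
print([r.name for r in restricted_a])  # ['a1-A', 'a6-A']
```

In practice the thresholds would be chosen per template and per target; the computational comparison against the binding site model is a separate, later filter.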

The computational comparison used to further limit the lists may be effected on a list by list basis with, for each list,
the selected members of the other lists remaining constant or preferably with the other substituent positions on the
template being vacant. Where the comparison does not involve such vacant sites these selected members may be
chosen on the basis of perceived compatibility with the binding site but conveniently once the first list (preferably the
shortest) has been evaluated, a highly compatible member of that list will be the invariant selected member for
evaluation of the next list and so on. Advantageously, once highly compatible members of the other lists have been identified,
the first list will be reevaluated with highly compatible members of the other lists being the invariant selected members.

Alternatively and much more preferably, the computational comparison may be significantly simplified by preselection
of one (or more if necessary) invariant locations and conformations for the template within the binding site
model followed by comparison, on a list by list basis for individual members of the lists and for an incompletely
substituted template, eg. the template carrying only the substituent(s) deriving from the individual list member being
investigated. In this way, alternative orientations of a list member which satisfies basic requirements such as appropriate
size and functionality (eg. charge, lipophilicity, hydrogen bond donor/acceptor, etc.), may be scrutinised to improve
the predictive value of the ranking which is the result of the comparison.
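
A minimal sketch of such a list by list comparison, with the template held in one fixed pose and only the substituent under investigation attached, is given below. The site types, the coordinates, the distance cutoff and the +1/-1 scoring are assumptions made purely for illustration and do not reproduce the scoring of any particular design program.

```python
import math

# Hypothetical binding-site model: (site type, x, y, z) in angstroms.
SITES = [("donor", 0.0, 0.0, 3.0), ("lipophilic", 4.0, 0.0, 0.0)]

# Which site type each substituent group type is complementary to.
COMPLEMENT = {"acceptor": "donor", "donor": "acceptor", "lipophilic": "lipophilic"}


def score_substituent(group_type, position, sites=SITES, cutoff=1.5):
    """+1 per complementary site within cutoff, -1 per mismatched site."""
    score = 0
    for site_type, *xyz in sites:
        if math.dist(position, xyz) <= cutoff:
            score += 1 if COMPLEMENT[group_type] == site_type else -1
    return score


def screen_list(members, min_score=1):
    """Rank one reagent list with the template pose held fixed.

    members maps a member name to (group type, position of its interacting
    group once attached to the fixed template).
    """
    scored = {name: score_substituent(*m) for name, m in members.items()}
    passed = [name for name, s in scored.items() if s >= min_score]
    return sorted(passed, key=lambda name: -scored[name])


# Hypothetical members of list A, each evaluated on its own.
list_a = {
    "a1": ("acceptor", (0.0, 0.0, 2.5)),    # faces the donor site
    "a2": ("donor", (0.0, 0.0, 2.5)),       # donor facing a donor: unfavourable
    "a3": ("acceptor", (9.0, 9.0, 9.0)),    # makes no contact
    "a4": ("lipophilic", (4.0, 0.0, 0.5)),  # sits on the lipophilic site
}
print(screen_list(list_a))  # ['a1', 'a4']
```

Because each list is screened independently, the number of evaluations grows with the sum of the list lengths rather than with their product.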

For step (1) of the process of the invention, one can conveniently input 3-D structural information about the binding
site (eg. X-ray crystallographic analyses) from published sources, preferably sources which are computer-accessible.

The binding site model generated in step (2) conveniently consists of a representation of those regions of the 
binding site that can be considered capable of molecular interaction with a xenobiotic or other molecule, labelled ac- 
cording to the nature and geometry of the possible interactions, eg. hydrogen bond donor sites, hydrogen bond acceptor 
sites, aliphatic and aromatic lipophilic sites, ionic and metal-binding sites, etc. 

DETAILED DESCRIPTION OF THE INVENTION

Briefly put, the process of the invention involves the following steps: 

- Construction of a virtual combinatorial library based around a template chemistry considered appropriate for the
target molecule and amenable to combinatorial synthesis

- Screening of members of the library based on their interaction with a target receptor

- Synthesis and testing of representative elements of the library as single compounds using a variety of synthetic
protocols.

With a known target molecule structure, this process can be done efficiently and accurately and overcomes various 
difficulties associated with applying combinatorial chemistry or Structure-Based Drug Design. 

Firstly the process offers all the advantages of an array based combinatorial library (single compounds, wider
variety of chemistries) whilst sidestepping the problem of small library arrays. This is because a very large virtual library
is considered and screened computationally, leaving only a small number of compounds to be synthesised and tested.

Secondly the need to have one synthetic protocol to cover a wide variety of chemistries is relaxed. The synthesis 
route can be tailored to accommodate a larger range of chemistries than could be considered by an automated method. 
Solution and solid phase methods can be used with protection and deprotection steps as required. This means that a
larger virtual library can be considered and thus the chance of locating active compounds is increased. The process
also allows for simple functional group transformations within the starting materials to increase further the diversity of 

Thirdly, by restricting the design process to molecules which are accessible by specified synthetic routes, one
avoids the problems often associated with rational drug design, ie. uncertain synthetic feasibility and slow feedback
between design and experiment. The process yields a set of compounds based around a common template which can
be rapidly synthesised and assayed for activity against a given target. Such a set of compounds might form an
immediate QSAR (Quantitative Structure Activity Relationship) training set, in contrast to other drug discovery paradigms
where further work would be necessary to derive an equivalent QSAR set. A primary advantage over a traditional
medicinal chemistry approach, which would involve obtaining a lead by a screening process and then synthesising a
large number of analogues to provide an SAR set, is thus one of cost-effectiveness.

Each member of the library consists of a common template with different substituents attached to it. The
substituents are derived from accessible chemical reagents and it is variation in these substituents at each template
attachment point which causes the combinatorial explosion in the number of individual molecules in the library. The
library has a synthesis route or strategy (sometimes referred to herein as the template chemistry) associated with it
whereby individual members are synthesised from available chemical reagents. The template itself can be an available
chemical or can be formed during the chemical reactions (e.g. in ring forming reactions).
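
By way of illustration only, the combinatorial growth of a virtual library built on a common template T can be sketched as below. The substituent labels are placeholders, and the string concatenation stands in for the actual template chemistry.

```python
from itertools import product
from math import prod


def enumerate_library(template, substituent_lists):
    """Yield one label per library member, eg. 'a1b1c1T'."""
    for combo in product(*substituent_lists):
        yield "".join(combo) + template


def library_size(substituent_lists):
    """Number of members: the product of the list lengths."""
    return prod(len(lst) for lst in substituent_lists)


def per_list_evaluations(substituent_lists):
    """Evaluations needed if each list is screened on its own."""
    return sum(len(lst) for lst in substituent_lists)


# Placeholder substituents for three attachment points.
lists = [["a1", "a2", "a3"], ["b1", "b2"], ["c1", "c2"]]
library = list(enumerate_library("T", lists))

print(library[:3])                  # ['a1b1c1T', 'a1b1c2T', 'a1b2c1T']
print(library_size(lists))          # 12
print(per_list_evaluations(lists))  # 7
```

With, say, 100 reagents at each of three attachment points the same arithmetic gives 1,000,000 library members but only 300 per-list evaluations, which is why screening substituents list by list before enumeration is attractive.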

The technological aspects of the process may be separated into four stages. The design specification stage
defines the constraints which are to be applied and explored in the computational procedure.
Obviously, these constraints include the actual specification of the library, the 3D structure of the receptor and any
specific constraints derived from the receptor to judge the quality of library members. The second stage involves
selecting chemical reagents and screening the corresponding substituents which are used to form members of the virtual
library. The substituent screening is based on the structure of the receptor. The accepted substituents are then
assessed and filtered using a variety of computer-aided techniques and chemical considerations. The third stage
involves enumeration of the virtual library, ie. production of the full library after all substituents have been
selected. The final computational stage of the procedure is to perform simple checks and calculations on the enumerated
library and arrive at a ranking of the molecules in the virtual library for synthesis and testing.
Each of these four stages will now be outlined in detail.

The three aspects of design specification which are discussed below are template selection, template positioning
and the design criteria, which will be discussed separately despite being inter-related.

Specification of the design criteria involves careful study of the target macromolecule. Thus, decisions need to be
taken about which X-ray structure(s) of the receptor are to be used (if more than one is available) and
whether some refinement by molecular dynamics/molecular mechanics needs to be carried out in order to generate a
more accurate starting point for molecular design. Typically more than one snapshot of the receptor structure will be
used in successive experiments. Also it is necessary at this stage to decide on the parts of the receptor with
which candidate compounds are to interact. A 'design model' is then generated for each,
eg. using the Design Model Generation functionality of PRO_LIGAND (see Clark et al.
(supra)). A design model consists of a number of interaction sites which originate from specified receptor atoms and
consist of either vectors (denoting favourable positions and directions for hydrogen bonding interactions with the active site)
or points (denoting positions of favourable lipophilic contact with the active site) (see Bohm, J. Comput.-Aided Mol.
Des. and Klebe, J. Mol. Biol. 237: 212 (1994)). The vectors and points are labelled to indicate
the particular chemistry they represent; thus D-X and A-Y vectors represent potential hydrogen bond donor and
acceptor positions respectively. Similarly, L and R sites represent aliphatic and aromatic lipophilic sites respectively. The
density, positions and orientations of the interaction sites are encoded in a rule-base which can be edited by the user
and is based on a statistical examination of experimentally preferred intermolecular contacts (see Klebe (supra)).
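
One possible in-memory representation of such a design model is sketched below, purely as an illustration. The labels D-X, A-Y, L and R follow the text; the class name, the field names and the validation rules are assumptions and do not describe the internal data structures of PRO_LIGAND.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]

VECTOR_LABELS = {"D-X", "A-Y"}  # hydrogen bond donor / acceptor vectors
POINT_LABELS = {"L", "R"}       # aliphatic / aromatic lipophilic points


@dataclass(frozen=True)
class InteractionSite:
    label: str                       # one of D-X, A-Y, L, R
    position: Vec3                   # coordinates in the receptor frame
    direction: Optional[Vec3] = None # only used for vector sites
    receptor_atom: str = ""          # hypothetical tag for the originating atom

    def __post_init__(self):
        if self.label not in VECTOR_LABELS | POINT_LABELS:
            raise ValueError(f"unknown site label: {self.label}")
        if self.label in VECTOR_LABELS and self.direction is None:
            raise ValueError("hydrogen bond sites need a direction")
        if self.label in POINT_LABELS and self.direction is not None:
            raise ValueError("lipophilic sites are points, not vectors")


def sites_by_label(design_model):
    """Group the interaction sites of a design model by their label."""
    grouped = defaultdict(list)
    for site in design_model:
        grouped[site.label].append(site)
    return dict(grouped)


# Hypothetical three-site design model.
design_model = [
    InteractionSite("D-X", (0.0, 0.0, 3.0), (0.0, 0.0, 1.0), "atom_17"),
    InteractionSite("A-Y", (2.0, 1.0, 0.0), (1.0, 0.0, 0.0), "atom_42"),
    InteractionSite("L", (4.0, 0.0, 0.0), receptor_atom="atom_58"),
]
print(sorted(sites_by_label(design_model)))  # ['A-Y', 'D-X', 'L']
```

Keeping the label on every site makes it straightforward to ask, for a given substituent, only about the sites whose chemistry it could satisfy.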

The purpose of the molecular template is to hold in position the substituents which will make hydrogen bonds,
lipophilic contacts or other favourable interactions with the binding site. An advantage of using structural information
in the choice of the template chemistry is that knowledge of the receptor can be used to increase the chances of the
library containing active molecules. A number of considerations can be identified in the choice of template
chemistry:

- The synthetic chemistry associated with the template should be relatively accessible and capable of delivering a
wide diversity of substituents at a number of attachment points.

- Ideally, the template itself should be capable of making a number of favourable contacts with the receptor. This
choice of template increases the likelihood that the library will contain active molecules.

- It is possible to infer likely templates from known inhibitors or substrates. For example, in the search
for thrombin inhibitors, a known inhibitor of thrombin is PPACK (D-phenylalanyl-prolyl-arginyl-chloromethylketone),
which contains a moiety which could be used as a template, or one could choose a known
substructure (eg. guanidinium in the S1 pocket of thrombin), which can be pre-positioned and used
to search for potential templates (ie. using the "anchor" technique referred to above).

- Templates can be designed de novo, using structure-based techniques. This could mean using a de novo design
method or a receptor-based database screening strategy. It is also advantageous to search reaction databases
for suitable ring forming reactions which, for example, would give rise to beta-sheet mimetics.

- It is desirable that the template has restricted conformational freedom so that only limited numbers of alternative 
positions for the template need be considered. 

The process of template selection may thus involve close collaboration between modellers and synthetic chemists,
the former providing expertise about the requirements of the templates in terms of molecular interactions with the active
site and the latter giving guidance concerning the synthetic feasibility of any choices made. The result of the template
selection process is a set of scaffolds chosen to achieve the best architecture in the active site and to minimise the
synthetic effort required to prepare them. In practice, the decision about which templates to pursue will be a balance
between the variety of factors discussed above.

It should be emphasized that, unlike many of the combinatorial chemistry techniques described hitherto, it is not
necessary that the template or the candidate compounds be peptides or peptoids.
Having chosen the set of templates to be used in the active site of interest, the next task is to position the templates
appropriately within the site. In principle, there will be a very large number of orientations of a template within the active
site (although this number can be reduced if the chosen template makes a specific interaction with the binding site
itself). What is required is to select a subset of these positions which place the template in such a way as to facilitate
the molecular interactions that will be formed by the substituents once they are attached.

This placement process could be achieved automatically by means of various objective docking protocols based 
on molecular mechanics or empirically based energy calculations (see Blaney Perspect. 

or geometric positioning upon interaction sites (see Bohm, J. Comput.-Aided Mol. Des. 8: 623 (1994)). The result of
template positioning is a position, or number of positions, in 3D coordinate space for each of the templates. The chosen
orientations are saved for future reference.

The process of substituent selection involves a number of steps:

- Searching the Available Chemicals Directory (ACD) of MDL Information Systems Inc., San Leandro, California, 
US (and/or other chemical compound databases) to find potential substituents for a given template 

- Computationally screening these potential substituents, eg. using techniques such as those used in the de novo 
design program, PRO_LIGAND (see Clark et al. (supra)) 

- Assessing and deciding on the preferred substituents at each position. 

Each of these steps is explained more fully below. It is important to realize that substituents attached to different
attachment points are preferably tested independently of each other at this stage. This makes the process of performing
detailed 3D checks on a large virtual library computationally efficient. This and other approximations inherent in this
...

For a given template, it is possible to infer for each template attachment point the nature of the interaction(s)
the corresponding substituent is to make with the active site (eg. hydrogen bond, lipophilic contact, etc.), the nature
of the functional group required for a coupling reaction to the template (eg. acid chloride with ...), and a
distance range between the point of attachment to the template and the point of interaction with the active site.

These two (or more) substructural criteria with the associated distance range(s) constitute a viable query for a 3D
database search using database searching tools, such as ISIS/3D from MDL Information Systems Inc., Unity from
Tripos Associates Inc., and Chem-3D from Chemical Design Ltd., etc. The query can be modified
through a consideration of potential molecular transformations, or through the imposition of synthetic constraints on
allowed chemistries in specified substructures. By using the ACD, the chance that all chosen substituents are
commercially available is maximised. In general, the search carried out should explore the conformational flexibility of the
database molecules to ensure that as many as possible of the potential substituents at each position will be retrieved.
For each template attachment point, a file of potential substituents may be saved as 2D structures to a file, eg. in
MDL's SD format (see Dalby, J. Chem. Inf. Comput. Sci. 32:244 (1992)) and then the Converter program (available
from MSI, San Diego, California, US) may be used to add the necessary hydrogen atoms and generate 3D coordinates.
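The query described above (two substructural criteria plus a distance range, evaluated over the conformations of each database molecule) can be sketched in Python as follows. This is only an illustration and is not ISIS/3D, Unity or Chem-3D: the class names, group labels and coordinates are hypothetical, and conformers are assumed to have been generated beforehand.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class AttachmentQuery:
    """Search criteria for one template attachment point (values are hypothetical)."""
    coupling_group: str      # functional group needed to couple to the template
    interaction_group: str   # group expected to interact with the active site
    min_dist: float          # allowed attachment-to-interaction distance, angstroms
    max_dist: float


@dataclass(frozen=True)
class Conformer:
    coupling_xyz: tuple      # coordinates of the coupling atom
    interaction_xyz: tuple   # coordinates of the interacting atom


@dataclass(frozen=True)
class Reagent:
    name: str
    groups: frozenset        # functional-group labels found in the 2D structure
    conformers: tuple        # pre-generated 3D conformers (flexibility is sampled here)


def matches(query: AttachmentQuery, reagent: Reagent) -> bool:
    """True if both groups are present and any conformer fits the distance range."""
    if not {query.coupling_group, query.interaction_group} <= reagent.groups:
        return False
    return any(
        query.min_dist <= dist(c.coupling_xyz, c.interaction_xyz) <= query.max_dist
        for c in reagent.conformers
    )


def search(query, reagents):
    """Return the names of all reagents that satisfy the query."""
    return [r.name for r in reagents if matches(query, r)]
```

Testing every conformer rather than a single stored geometry reflects the point made above that the search should explore conformational flexibility, so that flexible reagents are not missed.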

The methods used for the computational screening of potential substituents may conveniently be techniques such
as those used in the de novo design package PRO_LIGAND (see Clark et al (supra)). As described earlier, each
template attachment site has its own design model and the template attachment sites themselves are appended to
the design models, according to the labels specified in the template file which is input to the program. By automatically
labelling the potential substituents for each template attachment position with appropriate interaction link sites, it is
possible to establish whether they can form good molecular interactions with the active site.

For more details, see Clark, J. Comput.-Aided Mol. Des. 9:13 (1995) and Murray, J. Comput.-Aided Mol. Des. 9:381 (1995).

The flexibility of this approach is enhanced by the ability to detect specified functional groups and replace them
with ... This increases the diversity in the virtual library that is computationally accessible ...
... of how this feature might be used.

The molecular transformations are controlled by ... containing a SMILES-like notation (see Weininger, J.
Chem. Inf. Comput. Sci. 28:31 (1988)) for the substructures together with a number of integers. Thus, for ...
the molecule is rebuilt atom by atom using a ... procedure and then relaxed with ...

- Molecular weight
- Number of atoms
- Log P (eg. calculated using the method of Viswanadhan, J. Chem. Inf. Comput. Sci. 29:163 (1989))
- Number of rotatable bonds

... acceptable range may be automatically rejected. This is useful for ...
... often a counterion. The code can also screen out duplicates.
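A property screen of the kind listed above (molecular weight, atom count, Log P, rotatable bonds, plus duplicate removal) might be sketched as below. The limits are hypothetical placeholders, the property values are assumed to have been computed elsewhere, and duplicates are detected simply by identical structure strings, which is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Substituent:
    smiles: str          # structure string, used here only as a duplicate key
    mol_weight: float
    n_atoms: int
    log_p: float
    n_rot_bonds: int


# Hypothetical acceptance windows; real limits would be chosen by the user.
LIMITS = {
    "mol_weight": (0.0, 300.0),
    "n_atoms": (1, 40),
    "log_p": (-3.0, 5.0),
    "n_rot_bonds": (0, 8),
}


def within_limits(sub: Substituent, limits=LIMITS) -> bool:
    """True if every screened property lies inside its allowed window."""
    return all(lo <= getattr(sub, name) <= hi for name, (lo, hi) in limits.items())


def screen(subs, limits=LIMITS):
    """Drop duplicates (by structure string) and out-of-range substituents."""
    seen = set()
    kept = []
    for sub in subs:
        if sub.smiles in seen:
            continue
        seen.add(sub.smiles)
        if within_limits(sub, limits):
            kept.append(sub)
    return kept
```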

An additional screen on substituents may be employed for some complex template chemistries. Thus, for example,
if in a ring forming reaction one chemical reagent gives rise to two substituents on the template, then ...
the template attachment points will have the same list of available chemicals ...

If a potential substituent cannot be assigned either interaction sites or link sites it is automatically
rejected. ...
... calculated using the directed tweak routines which seek to establish the maximum and minimum distances that can
be attained between all pairs of atoms through rotation about rotatable bonds (see Murray, J. Comput.-Aided Mol. Des.
9:381 (1995)). The subgraph isomorphism algorithm then uses these distance ranges in establishing a match in the
manner described by Clark, J. Mol. Graphics 10:194 (1992).
If no match is found for the substituent, it is rejected and the algorithm returns to consider the next available
substituent.

The finding of a match for a substituent in the subgraph isomorphism check is not necessarily a sufficient condition
for a substituent to be accepted. This is because the distance bounds matrix does not include correlation effects, ie.
the effect that one interatomic distance having one value might have on the possible values attainable by the other
interatomic distances. Thus, in order to establish whether the substituent is in fact a viable one for the template
attachment point in question, a specific matching conformation should be generated using some form of conformational
exploration procedure (see Clark, J. Mol. Graphics 10:194 (1992)).

The procedure adopted for the experimental trials reported below is based on the directed tweak algorithm (see
Hurst, J. Chem. Inf. Comput. Sci. 34:190 (1994)) which was originally developed for 3D database searching
applications where it has been shown to be both efficient and effective. Its utility in the field of de novo design has
recently been demonstrated (see Murray, J. Comput.-Aided Mol. Des. 9:381 (1995)).

The directed tweak algorithm takes the match established by the subgraph isomorphism algorithm and then seeks
to verify it by performing a torsional optimisation of the rotatable bonds in the substituent. After a potential match has
been located, the substituent is attached to the template. The bond where attachment occurs is treated as rotatable.
The following cost function is minimised by a steepest descent method:

where the summation occurs over all N interaction sites, and d_i is the distance between the ith substituent interaction
site and the design model interaction site with which it is matched. a_i is a coefficient which depends upon the type of
interaction site being matched and is a simple function of the tolerances used in the subgraph isomorphism algorithm.
The cost function differs from that used by Hurst (J. Chem. Inf. Comput. Sci. 34:190 (1994)) and Murray (supra) in that
the distances between pairs of sites are not included, only the absolute distance between the two matched sites. This is
because the template attachment site provides a fixed point of reference in the design model coordinate
space. This means that there are fewer terms in the cost function expression and it is likely that the simpler expression
has fewer local minima. There is also no need to check the chirality of the conformations produced. These advantages
...

A conformation is accepted if it passes the following criteria. The value of the cost function
must be less than a user defined maximum (typically about 0.5 Å²), and the substituent must not be clashing with the
receptor or with the template or with itself. If the conformation fails these checks, the tweak routines are used to find an
alternative conformation - the procedure is repeated until an acceptable geometry is located or a user definable number
of attempts has been made.
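As a rough illustration of the torsional optimisation and acceptance test, the sketch below minimises a cost over a single torsion angle by steepest descent. Several things here are assumptions rather than the method of the text: the cost is taken to be a weighted sum of squared distances (suggested only by the Å² units of the threshold), the substituent is reduced to one interaction site rotating about the attachment bond, and the gradient is estimated by finite differences. All names and parameter values are hypothetical.

```python
import math


def cost(pairs):
    """Weighted sum of squared distances over matched interaction-site pairs.

    pairs: iterable of (a_i, substituent_site_xyz, model_site_xyz).
    The squared form is an assumption made for this sketch.
    """
    return sum(a * math.dist(p, q) ** 2 for a, p, q in pairs)


def site_position(phi, radius=2.0, height=1.0):
    """Toy substituent: one site rotating about the attachment bond (the z axis)."""
    return (radius * math.cos(phi), radius * math.sin(phi), height)


def minimise_torsion(target, a=1.0, phi=0.0, step=0.05, iters=200, h=1e-5):
    """Steepest descent on one torsion angle with a central-difference gradient."""
    def c(angle):
        return cost([(a, site_position(angle), target)])

    for _ in range(iters):
        grad = (c(phi + h) - c(phi - h)) / (2 * h)
        phi -= step * grad
    return phi, c(phi)


def accept(final_cost, clashes, max_cost=0.5):
    """Accept only if the cost is below the user-defined maximum and nothing clashes."""
    return final_cost < max_cost and not clashes
```

Because the template attachment point is fixed, each term depends only on the distance between one substituent site and its matched model site, which is why a single list of matched pairs is enough to evaluate the cost.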

The substituent is optionally minimised using a molecular mechanics energy
function. This is done in the presence of the receptor (which is treated as rigid) and a cut-off on the long range terms
is usually applied. An estimate of the strain energy in the receptor-bound conformation is obtained by performing
a minimisation (starting from the tweak-generated geometry) in the absence of the receptor and subtracting from this
the intramolecular energy of the receptor-bound conformation. During these calculations, the template part of
... The molecular mechanics calculations may be done employing the fast and approximate
Clean forcefield described by Hahn (J. Med. Chem. 38:2080 (1995)). Partial charges are calculated using the method
of Gasteiger and Marsili (see Tetrahedron 36:3219 (1980)). The Clean forcefield bears many similarities to the
generalised atom forcefield incorporated in the Chem-X software (available from Chemical Design Ltd, Chipping Norton,
...) ... calculating the energy of a system (see Hahn (supra)). A number of minor adjustments may be made in the
implementation of the forcefield. The first is that all hydrogen atoms are treated specifically and are assigned a ... atom
type. The second is that van der Waals' radii for potential hydrogen-bond-forming atom pairs are scaled, typically by
0.8. It should be realised that the purpose of the Clean forcefield in the process of the invention is to provide a rough
clean up of the substituents, which may possess distorted geometries caused by unrealistic torsion angles. The
forcefield must be robust, in the sense that it must be able to cope with any chemistries that are given to it and this is why
a generalised atom forcefield is the most obvious choice. Additionally it must meet the approximate accuracy criteria,
and ... the accuracy of the intermolecular terms is important. It was after analysis of intermolecular
geometries ... receptor geometries ... when some portion of the molecule (here the template) is held fixed.

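... (... to the template) is then assigned a score ... Bohm's scoring function ... program LUDI (see J. Comput.-Aided
Mol. Des. 8:243 (1994)) ... calculation of the binding energy of the substituent ...

\[
\Delta G_{binding} = \Delta G_{0}
  + \Delta G_{hb} \sum_{h\text{-}bonds} f(\Delta R, \Delta\alpha)
  + \Delta G_{ionic} \sum_{ionic} f(\Delta R, \Delta\alpha)
  + \Delta G_{lipo} \, |A_{lipo}|
  + \Delta G_{rot} \, NROT
\]

where

\[
f(\Delta R, \Delta\alpha) = f_1(\Delta R) \, f_2(\Delta\alpha)
\]

and

\[
f_1(\Delta R) =
\begin{cases}
1 & \Delta R \le 0.2\ \text{Å} \\
1 - (\Delta R - 0.2)/0.4 & \Delta R \le 0.6\ \text{Å} \\
0 & \Delta R > 0.6\ \text{Å}
\end{cases}
\]

and

\[
f_2(\Delta\alpha) =
\begin{cases}
1 & \Delta\alpha \le 30^{\circ} \\
1 - (\Delta\alpha - 30)/50 & \Delta\alpha \le 80^{\circ} \\
0 & \Delta\alpha > 80^{\circ}
\end{cases}
\]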

... whose geometry deviates from ideality. ΔR is the deviation of ...
... the contribution from an unperturbed ionic interaction. ΔG_lipo denotes the contribution due
to the lipophilic contact ... ΔG_rot describes the loss of binding energy
due to the freezing of internal degrees of freedom ...
excluding the ...

The values of the coefficients are those adopted by Bohm for the LUDI program: ΔG_0 = 5.4, ΔG_hb = -4.7, ...
... ligand-receptor binding ...
... geometries were obtained by ...
... is not expected to be better than 1.5 orders of magnitude in the binding affinity.
... to the strength of interaction ...
... the pre-computed score for the template from the total score
for the template-substituent combination.
... the substituent geometry ... conformation ...
... template ... automatically enu...

The output thus far for each template attachment point is a directory of substituent files each containing scoring
and database information. A directory of substituents, the receptor structure and the template molecule are read in to
a graphical visualisation package. This package may be designed to allow the user to scroll quickly through the
substituent list in any order whilst displaying the file name and any molecular properties that are present in the substituent
files (e.g. the strain energy, the Bohm score or the components of the score). The properties may be displayed in a
spreadsheet running alongside the molecular visualisation. Substituents can be visualised in isolation, with the template
or with the receptor structure.

A set of substituent structures is treated as a list on which operations can be performed by the user. For instance,
one would probably want to store all structures with Bohm scores less than a given value in a new list of 'Good Scores';
one might also want to exclude all structures with high strain energies, and possibly remove bad structures judged by
more subjective criteria (e.g. bad chemistries or geometries). The user can have full control over which list of structures
is displayed. At any time the user can write a list to a new or old directory or remove a list from an old directory.

Coupled to the list functionality is a clustering facility which allows one to cluster a specified list on the basis of 2D
chemical functionality. The clustering may be based on similar functionality available in PRO_LIGAND which measures
similarity by Tanimoto coefficients derived from bit string representations of the chemical structures (see Willett, J.
Chem. Inf. Comput. Sci. 26:109 (1986) and Barnard, J. Chem. Inf. Comput. Sci. 32:644 (1992)). The bit strings may
be specified by 172 atom-centred fragments generated from an analysis of 5000 structures in the Cambridge Structural
Database (see Allen, J. Chem. Inf. Comput. Sci. 31:187 (1991)). Several different clustering algorithms are available,
and one may use a hierarchical clustering method such as Complete Linkage or Ward's. (The number of structures
clustered may typically be about 100 or less, so CPU time is not an issue.) A number of tools are available to help
decide on the appropriate number of clusters for the specified lists. The output from the clustering is a new set of lists
each containing an individual cluster. These can be browsed and operated on as described above. Whilst the clustering
is not always perfectly in line with chemical intuition, it is an extremely useful way of navigating through and keeping
track of a fairly large number of substituents.
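The similarity measure and a complete-linkage grouping can be sketched compactly. In the sketch below fingerprints are held as Python integers used as bit strings; the stopping threshold and the naive cubic-time merging loop are choices made for illustration (adequate for lists of about 100 structures) and are not taken from PRO_LIGAND.

```python
def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient of two bit strings: common bits / bits set in either."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return bin(a & b).count("1") / union


def complete_linkage(fingerprints, max_dist):
    """Agglomerative clustering; merge while the farthest-pair distance <= max_dist."""
    clusters = [[i] for i in range(len(fingerprints))]

    def distance(c1, c2):
        # Complete linkage: cluster distance is the largest member-to-member distance.
        return max(
            1.0 - tanimoto(fingerprints[i], fingerprints[j]) for i in c1 for j in c2
        )

    while len(clusters) > 1:
        best, i, j = min(
            (distance(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best > max_dist:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

Each returned cluster is a list of indices into the input, so it can be written out as one of the new substituent lists described above.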

The final facility provided by the molecular browser is to rescore a list of substituents using the empirical Bohm
score. Rescoring in this way is practical because tens of structures can be scored per second and is useful because
information gained during the scoring can be used to provide a graphical representation of the score. Hydrogen bonds
or ionic interactions are located, marked and annotated with the contribution they make to the predicted binding affinity.
This saves a lot of time in deciding which hydrogen bonds are formed and how good they are. It also points out hydrogen
bonds which may be contributing to the score in an unrealistic way. Bonds that are considered rotatable are also marked
so that the user can see which bonds are (or are not) contributing to the score. Finally, the grid used to establish the
lipophilic contribution to the score is displayed graphically. Relevant grid points fall into several categories:

- lipophilic ligand atom in contact with lipophilic receptor atom

- lipophilic ligand atom in contact with polar receptor atom (or vice versa)

- polar ligand atom in contact with polar receptor atom

- lipophilic ligand atom in contact with nothing (i.e. solvent)

- polar ligand atom in contact with nothing (i.e. solvent)

- volume of ligand

The user can colour each of these grid point types, though in practice, we have tended to use colours for the first
three types only. The visualisation is useful because it displays aspects of ligand-receptor contact which are often
difficult to assess quickly from looking at the complex alone.

After application of these tools a smaller set of substituents is decided on for each of the template attachment
points. The aspects which are considered in producing this list are:

2D diversity: Using the clustering tools and chemical knowledge, a diverse set of substituents may be chosen. For
example, if there are 10 fluorinated derivatives of phenylalanine only one need be chosen. Exploration of different
chemistries is important because the scoring functions can only be expected to deliver approximate accuracy in the
prediction of the binding affinity.

3D contacts: It is important to look at the contacts a substituent is predicted to make with the receptor and to form
a judgement as to whether these seem reasonable or not. In particular, substituents which have a large amount of
polar-nonpolar contact are suspect. There should also be an awareness of 3D diversity and there should be an attempt
to target molecules which explore different forms of receptor contact to make up for deficiencies in the scoring criteria.

synthetic considerations: There should be a consideration of synthetic feasibility. Although the strategy of making
single compounds by the most appropriate protocol means that a larger diversity of substituents are synthetically
accessible, there will still be some substituents which contain functionalities that are difficult to incorporate in any
synthetic ... If only one compound is to be chosen from several similar possibilities, choices could be made
on the basis of ease of availability or price of the compounds.

scores: The scores of the substituents (e.g. Bohm scores, forcefield energies, etc.) can be used to choose preferred
substituents from among lists of similar compounds.

Combinatorial enumeration simply involves forming a list of all the remaining substituents at each
R-group position and then creating all possible combinations of them. Thus, given a template with three R-group
positions ... the combinatorial enumeration procedure will produce ...
The geometries produced are based on the highest scoring geometries of the corresponding substituents. The resulting
molecules are stored for further analysis or transfer to a 3D database.
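Forming every combination of the surviving substituents is a Cartesian product over the per-position lists. The sketch below is a minimal illustration; the template name and substituent labels are hypothetical stand-ins for real structures.

```python
from itertools import product
from math import prod


def enumerate_library(template, substituent_lists):
    """Yield one tuple (template, r1, r2, ...) per combination of substituents."""
    for combo in product(*substituent_lists):
        yield (template, *combo)


def library_size(substituent_lists):
    """Number of compounds in the virtual library: product of the list lengths."""
    return prod(len(lst) for lst in substituent_lists)
```

With two attachment points carrying 8 and 9 selected substituents, `library_size` gives 8 × 9 = 72 compounds, and the size can be checked in this way before any structures are actually built.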

The resulting molecules can be, but are not usually, minimised with the Clean forcefield and are then rescored in
... are also routinely calculated ...

In our applications, the complete molecules have also been subjected to evaluation using the CFF95 forcefield in
Discover (available from Molecular Simulations Inc., San Diego, California, US). Simplified cut down models of the
receptor are used and minimisation and molecular dynamics are used to assess the quality of the designs. Molecules
... stable during dynamics ... possess high scoring snapshots then ...

The final decision about which molecules to synthesise is made by considering all the data collected for the
substituents and enumerated molecules. The full library could be synthesised, or selected molecules can be chosen from
the full library. The possibility of experimental design to choose the best candidates has been explored. The method
... optimal design with which to maximise the coverage of a specified property space in a subset
... exploration ... in the following properties is approximately maximised:

- the substituents from which each library member was derived

- estimated value of logP for each library member

- the hydrogen bond, the rotatable bond and lipophilic contributions to the Bohm score for each library member

Other constraints can be imposed on the design such as inclusion or exclusion of compounds which are outside
a specified range of a molecular property. The general conclusion of this application of experimental design was that
... such as ease of synthesis of particular classes of ...

Most of the computationally intensive routines of the operating software for the process of the invention may be
written in Fortran, the data structure and data handling code in C, and the drivers and user interface ...
Global, an interpreted language designed for application to computer-aided molecular design. The main
use of Global is that, together with the chemical utilities and their associated data structure routines, it provides a
... operation of the process of the invention. A language which allows ...
... to be expressed succinctly and naturally makes the methods easy to program, amend
and debug. Global also makes mundane tasks such as IO and memory management straightforward and frees the
... chemical design aspects of a problem. Because there is no compilation
... easy ... the drivers and run them interactively or in batch mode. The user can
either treat the GLOBAL files as input decks in the traditional sense or, if they have more confidence, can make fairly
... of drivers, introducing ...
... CAMD applications as illustrated by TRIPOS's SPL
language or the various languages offered to MSI users.

... orally, rectally, nasally, transdermally, by injection or infusion, or into the lungs. Typical
... capsules, suppositories, ... sprays, solutions, dispersions, suspensions,
emulsions and gels. Such compositions may contain conventional pharmaceutically acceptable carriers and
excipients, eg. water for injections, physiological saline, buffers, sweeteners, dispersants, bulking agents, etc.

EXAMPLE

The generation of a library of thrombin inhibitors is described as an example of the present invention.
Thrombin is a trypsin-like serine protease recognised as a key enzyme within the coagulation cascade. Its primary
action is catalysis of the conversion of soluble fibrinogen to insoluble fibrin, which is the basis of thrombus formation
and blood clotting. In addition, thrombin has several other roles in the control of pro- and anti-coagulant pathways in
the coagulation cascade, inducing platelet aggregation, and more general signalling roles via activation of a thrombin
receptor. As thrombin is the final step in both the extrinsic and intrinsic clotting cascades, it has attracted much attention
as a therapeutic target. Modulation of thrombin activity may be of use to prevent inappropriate thrombus formation, for
example as a general anticoagulant, as an adjunct to surgery or as a prophylaxis in various cardiovascular disorders
such as myocardial infarction and unstable angina. Direct competitive inhibition of thrombin has been pursued by
several pharmaceutical companies in an effort to obtain a new class of anticoagulants and antithrombotics, potentially
with good oral bioavailability and improved efficacy and toxicity when compared with existing drugs.

The present example relates to the design of a library of novel thrombin inhibitors which are potential drug
candidates. At this stage the quality of the designs was assessed in terms of an in vitro assay of thrombin inhibition.
Successful designs may be selected on the basis of the measured inhibition constant (Ki). In addition, the selectivity of the
compounds towards thrombin may be assessed by performing enzyme inhibition assays versus structurally-related
serine proteases, such as trypsin and Factor Xa. In general, enzyme specificity is an important consideration because
an intended thrombin inhibitor may also inhibit fibrinolytic enzymes and hence exert an undesired thrombotic effect.
The first stage in the application of the process was the identification of an appropriate template structure, and an
associated synthetic strategy. This was achieved by analysis of known thrombin inhibitors in order to identify chemical
moieties which appear to contribute favorably to binding. The source of the data was the Brookhaven protein database.
From the available thrombin-ligand complexes it was decided to select as a template the proline moiety from the inhibitor
PPACK (D-phenylalanyl-prolyl-arginyl-chloromethylketone).

This was chosen for several reasons. Previous analysis of SAR data for thrombin had highlighted PPACK as an
effective inhibitor bound at the active site (that is the site at which the catalytic hydrolysis of the peptide substrate
occurs). The activity of PPACK was believed to be the result of making favorable interactions with several distinct
regions of the active site, most importantly two hydrophobic pockets (labelled as distal and proximal to the catalytic
amino-acid residues) and a polar pocket (the arginine-binding pocket or in enzyme terminology the S1 pocket). Proline
was considered a good choice for a template because it fulfilled the design criteria that it should make some favorable
interactions in itself, and also allow the positioning of substituents which will also make favorable interactions. Analysis
of the X-ray structure revealed that proline makes good interactions with the proximal hydrophobic pocket and allows
the positioning of a potential library of substituents which are likely to make good interactions with the remaining two
pockets. In the absence of an X-ray structure these assumptions would have to be made on the basis of modelling the
...

The structure of PPACK and some of its key interactions with thrombin are shown in Figure 1. It was the intention
to design a library of reversible inhibitors exploring a diverse set of substituents in the D-Phe and Arg positions.

Several sets of substituent lists were prepared using different design criteria. Initially, the N-terminus on proline
was targeted with starting reagents that possessed a carboxylic acid (to form a peptide bond with the template), and
a hydrogen bond donor plus a hydrophobic group (to form contacts with the D pocket). This was later augmented by
a list of sulphonic acids and sulphonyl chlorides (to form a sulphonamide bond with the template). The C-terminus was
initially targeted with starting reagents that possessed bis-amines (to form a peptide bond with the template and
hydrogen bonds in the S1 pocket). This was augmented by a list of amines with aromatic nitro compounds used as
'protected' anilines; and a list in which amines were 'protected' as nitrile compounds. In all cases, there were 2D and
3D constraints imposed when searching through the ACD. Two positionings of the template in an ... receptor
conformation were used. The first was derived directly from the proline position in the crystal structure of the covalently
bound PPACK, and the second was derived from a computational simulation of a non-covalently bound analogue of
PPACK.

Table 1 gives some details of the numbers of compounds considered at each stage of the process for the second
template position (the results from the first template position are very similar). For the sake of clarity the results for only
one substituent list at each template attachment point are given. The 2D search was not fully refined and includes
many reagents which are not practicable with any simple synthetic route. It also includes many substituents which
would be ruled out of a single protocol combinatorial approach yet have been successfully included in our trial
compound set. It is clear that even after a thorough application of 3D database searching, the virtual library size is still
enormous and receptor screening and scoring are required to reduce it to a manageable number.

Table 1:

                                 No. of accepted substituents    No. of compounds in
Stage                            Arg position    Phe position    Virtual Combinatorial Library

After 2D ACD screen                      4262            8803                         37518386
After 3D ACD screen                       894             437                           390678
After receptor screen                     144             145                            20880
After binding affinity screen              65              81                             5265
After strain energy screen                 53              71                             3763
Selected synthesis candidates               9               8                               72



... the product was cleaved from the resin with 10% TES in TFA (30 min) ...
... deprotection (5% aq TFA), evaporation and trituration with diethyl ether, yielded the crude product.

... as described ... the appropriate amine B compound, after appropriate ...



EP 0 818 744 A2



Table 2:

Inhibition results for molecules synthesised. The Ki values are in micromolar. Where values are in parentheses,
only the crude compounds were tested. Generally, activity increased by between 3 and 10 times when pure samples
were used. Where no value is given, the molecules were not active. The errors in the calculation of the Ki for purified
compounds are less than 10%.

Compound   Thrombin   Trypsin      Compound   Thrombin   Trypsin

A1B1       0.56       0.95         A1B2       (20)       (0.6)
A1B3       (30)       (10)         A1B4       (100)      (-)
A2B1       0.12       0.25         A2B2       0.83       0.19
A2B3       2.8        8.8          A2B4       (50)       (-)
A3B1       0.041      0.60         A3B2       0.30       0.95
A3B3       1.3        58           A3B4       (30)       (-)
A4B1       1.4        1.9          A4B2       (22)       (0.6)
A4B3       (50)       (13)         A4B4       (200)      (-)

A2B5       (10)       (40)         A2B6       (90)       (-)
A2B7       0.71       58           A2B8       (20)       (6)
A2B9       0.69       1.5          A2B10      4.0        590
A2B11                              A2B12
A2B13                              A2B14      48
A2B15

A5B1       2.9        0.12         A7B1       0.28       1.0
A8B1       1.6        0.67         A9B1       0.53       0.52
A10B1      (-)        (9)          A12B1      (-)        (D
A13B1      (200)      (90)


At the Phe position, the best scoring substituents were aromatic D-amino acids, which reflects the strict 3D
constraints imposed by the thrombin active site and the need to form good hydrophobic contacts if high affinity is to be
achieved. The process of the invention did produce non-amino acid solutions but these scored poorly. The best
substituent was the p-Br-D-Phe (A3) which is three times more active than the simple Phe derivative. The available starting
material was a racemic mixture and the resulting diastereoisomers were separated by HPLC. As predicted, one of the
diastereoisomers was at least 100 times less active than the other. Of particular interest are the ...
functionality (A4, A5 and A7) which have not been thoroughly explored before in PPACK analogues. These were
selected because they were predicted to form additional hydrogen bonds, which if not contributing to affinity, could
enhance selectivity. The poor activity of the sulphonamide derivatives against thrombin was not particularly surprising
since the design criteria for this substituent list omitted a hydrogen bond to Gly-216. Despite this drawback the
syntheses were justified because the sulphonamides increased the chemical diversity and allowed the exploration of
different modes for the hydrophobic pocket.
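Selectivity between thrombin and trypsin can be summarised as a ratio of inhibition constants. The short sketch below uses three purified-compound entries read from Table 2; defining selectivity as Ki(trypsin) / Ki(thrombin) is a convention assumed for this illustration and is not stated in the text.

```python
def selectivity(ki_thrombin: float, ki_trypsin: float) -> float:
    """Ratio above 1 means thrombin is inhibited more strongly than trypsin."""
    return ki_trypsin / ki_thrombin


# (Ki thrombin, Ki trypsin) in micromolar, purified compounds from Table 2.
KI_MICROMOLAR = {
    "A2B1": (0.12, 0.25),
    "A3B1": (0.041, 0.60),
    "A3B3": (1.3, 58.0),
}

ratios = {name: selectivity(*ki) for name, ki in KI_MICROMOLAR.items()}
most_selective = max(ratios, key=ratios.get)
```

A lower Ki means tighter binding, so the most potent thrombin inhibitor in this subset (A3B1) is not the one with the largest selectivity ratio (A3B3).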

At the Arg position, the most active base is agmatine, as expected, since the guanidino group can make excellent
contacts with Asp-189 and Gly-218 at the bottom of the S1 subsite. However there is great incentive to diverge from
... because of its pharmacokinetic properties and side-effect profile. Di-amino pentane (B9) is
active as would be expected for a lysine analogue (see Brady, Bio Med. Chem 8:1063 (1995)). The other bis-amines
also have respectable activity, which is of interest because good hydrophobic contacts in ... increase
affinity and selectivity (see Deadman, J. Med. Chem. 38:1511 (1995)). The activity of the short aniline (B7) is particularly
interesting. It is unlikely that this substituent is long enough to interact directly with Asp-189 (although there could be
... molecule), instead it is predicted to form hydrogen bonds to Gly-219 and Ala-190. ...
of this compound which caused us to explore different anilines using the functional group ...
Viewed from a second aspect the invention also provides novel active compounds identified by the process of this 
invention. Thus the compounds for which non-parenthesized activity values are given in Table 2 above , are . deemed I to 
fall within the scope of the invention as are all other active PPACK analogs incorporating the successful substituents 
that characterise reagents A3, A4, A5, A7 and B7. 
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Claims 

1 . A process for drug candidate identification, said process comprising the steps of: 

(1) obtaining a computerised representation of the three-dimensional structure of a binding site on the surface
of a biological macromolecule;

(2) generating a computerised model of the functional structure of said binding site which may be used to
identify favourable and unfavourable interactions between the binding site and a drug candidate molecule;

(3) identifying a molecular fragment capable of placement within said binding site and capable of carrying at
least one substituent group, said molecular fragment either being capable of being synthesized from reagent
compounds accessible in substituted form whereby to import said substituent groups on synthesis of said
molecular fragment or being present in an accessible reagent compound capable of substitution with said
substituent groups by reaction with further accessible reagent compounds;

(4) generating a set of lists of accessible reagent compounds, the lists being such that a combination of
compounds taken from each list may be reacted to produce a candidate compound comprising said molecular
fragment carrying a plurality of substituent groups thereby generating a first virtual library of candidate
compounds being the theoretical set of compounds producible by reaction of the members of said lists,
each member of each list comprising a component common to the other members of that list and a component
unique within that list;

(5) for each said list limiting the number of members thereof using a first set of exclusion rules thereby to
generate a restricted second virtual library of candidate compounds, the operation of said first set of rules
involving for each member of each list computerised comparison for favourable or unfavourable interactions
between said computerised model and a structure comprising said molecular fragment and a substituent
deriving from the unique component in said list of that member, the molecular fragment and the computerised
model being held in fixed spatial relationship to each other for said comparison;

(6) evaluating and ranking by computer the members of said second virtual library for favourable and
unfavourable interactions with said computerised model and thereby generating a restricted third virtual library of
candidate compounds ranked as having favourable interactions;

(7) optionally, selecting from said third virtual library at least one further molecular fragment and repeating
steps (4), (5) and (6) to generate an alternative third virtual library;

(8) screening said third virtual library using a second set of exclusion rules thereby to generate a restricted
fourth virtual library;

(9) synthesising some or all candidate compounds of said fourth virtual library to produce a candidate
compound library;
(10) experimentally evaluating the compounds of said candidate compound library for drug efficacy; 
(11) analysing the experimental efficacy data generated in step (10) for structure-activity relationship
information;

(12) [...] derived in step (11), selecting a revised set of lists of accessible reagent compounds,
said lists being expanded to include selected reagents not present in the restricted lists generated in step (5)
and optionally restricted to exclude selected reagents present in the restricted lists generated in step (5);

(13) repeating steps (6) and (7) to identify further compounds which are candidates for synthesis and
experimental evaluation for drug efficacy;

(14) synthesising and experimentally evaluating said further compounds for drug efficacy; 
(15) if required repeating steps (11) to (14) one or more times;

(16) identifying as a lead candidate a compound synthesized and experimentally evaluated as above. 
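
The library bookkeeping of steps (4) to (6) can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the reagent labels, the toy scores standing in for the computerised comparison against the binding-site model, the additive ranking and the cutoff are hypothetical and are not taken from the specification. It shows only that restricting each list separately, with the molecular fragment held fixed, shrinks the enumerated library multiplicatively before whole compounds are ranked.

```python
from itertools import product

# Hypothetical reagent lists. Each member is (label of its unique component, toy score).
# Lower scores stand for more favourable interactions; all values are invented.
reagent_lists = {
    "A": [("a1", -2.0), ("a2", 1.5), ("a3", -0.5)],
    "B": [("b1", -1.0), ("b2", -3.0), ("b3", 4.0)],
    "C": [("c1", 0.5), ("c2", -1.5)],
}


def enumerate_library(lists):
    """Step (4): one member from every list gives one virtual compound."""
    names = sorted(lists)
    return [dict(zip(names, combo)) for combo in product(*(lists[n] for n in names))]


def restrict_lists(lists, cutoff):
    """Step (5): judge each list member on its own, with the fragment held fixed."""
    return {
        name: [member for member in members if member[1] <= cutoff]
        for name, members in lists.items()
    }


def rank_library(library):
    """Step (6): rank whole compounds (assumed additive toy score, best first)."""
    return sorted(library, key=lambda c: sum(score for _, score in c.values()))


first_library = enumerate_library(reagent_lists)
second_library = enumerate_library(restrict_lists(reagent_lists, cutoff=0.0))
third_library = rank_library(second_library)[:2]

# Name the top compound with the template written as T.
best = "".join(label for label, _ in third_library[0].values()) + "T"
print(len(first_library), len(second_library), best)
```

With these invented values the first virtual library has 18 members and the restricted second library has 4, because each list is pruned before the combinations are formed.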

2. A process according to claim 1 wherein in step (7) at least one further molecular fragment is selected from the
third virtual library and steps (4), (5) and (6) are repeated to generate an alternative third virtual library
which is subsequently screened in step (8).

3. A process according to either of claims 1 and 2 wherein in step (12) said revised set of lists of accessible reagents
is selected to include reagents excluded from the restricted lists generated in step (5) for being analogs of reagents
included in said restricted lists.

4. A process according to any one of claims 1 to 3 wherein in step (12) said revised set of lists of accessible reagents
is selected to include reagents excluded from the restricted lists generated in step (5) for involving complex
transformation in their synthesis from commercially available reagents.

5. A process according to any one of claims 1 to 4 wherein in step (12) said revised set of lists of accessible reagents
is selected to include reagents excluded from the restricted lists generated in step (5) for being produced in low
yield in their synthesis from commercially available reagents.

6. A process according to any one of claims 1 to 5 wherein in step (12) said revised set of lists of accessible reagents
is selected to include reagents excluded from the restricted lists generated in step (5) for requiring significant
purification following their synthesis from commercially available reagents.

7. A process according to any one of claims 1 to 6 wherein in step (12) said revised set of lists of accessible reagents
is selected to include reagents excluded from the restricted lists generated in step (5) for being expensive.
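
Claims 3 to 7 each readmit reagents that step (5) excluded for a particular practical reason. One way to picture this is to record a reason alongside every exclusion so that the revised lists of step (12) can be selected by reason. The sketch below is hypothetical: the `Reason` names paraphrase the claims, while the reagent labels and the `revise_list` helper are assumptions introduced for illustration only.

```python
from enum import Enum, auto


class Reason(Enum):
    """Why a reagent was left out of a restricted list (paraphrasing claims 3 to 7)."""
    ANALOG = auto()                  # analog of an included reagent (claim 3)
    COMPLEX_TRANSFORMATION = auto()  # complex synthesis (claim 4)
    LOW_YIELD = auto()               # low-yield synthesis (claim 5)
    PURIFICATION = auto()            # significant purification needed (claim 6)
    EXPENSIVE = auto()               # cost (claim 7)


def revise_list(restricted, excluded, readmit, drop=()):
    """Hypothetical helper: keep restricted members not in `drop`,
    then append excluded reagents whose recorded reason is in `readmit`."""
    kept = [r for r in restricted if r not in drop]
    added = [r for r, reason in excluded.items() if reason in readmit]
    return kept + added


# Invented example data for one reagent list.
restricted_b = ["b1", "b2"]
excluded_b = {"b3": Reason.ANALOG, "b4": Reason.EXPENSIVE, "b5": Reason.LOW_YIELD}

revised_b = revise_list(
    restricted_b,
    excluded_b,
    readmit={Reason.ANALOG, Reason.EXPENSIVE},
    drop={"b1"},
)
print(revised_b)
```

Here the optional restriction of step (12) is modelled by `drop`, and the expansion by `readmit`.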

8. A process according to any one of claims 1 to 7 wherein in step (1) said representation is derived from X-ray 
crystallographic data for said macromolecule. 

9. A process according to any one of claims 1 to 8 wherein in step (4) said lists of accessible reagents are generated
from a computer database of available chemicals. 

10. A process according to claim 9 wherein in step (4) said lists of accessible reagents are supplemented to include
reagents accessible by transformation of reagents identified from said database.

11. A process according to any one of claims 1 to 10 wherein the model generated in step (2) comprises a
representation of those regions of the binding site capable of interaction with a molecule placed in said binding site,
the said regions being identified according to the nature and geometry of said interaction.

12. A process according to any one of claims 1 to 11 wherein in step (5) said computerised comparison for a reagent
involves in sequence: (i) carrying out a subgraph isomorphism check to establish a match between said unique
component of said reagent and said computerised model, (ii) rejecting reagents for which no match can be found,
(iii) predicting a conformation for non-rejected reagents by torsional optimization of the rotatable bonds in the unique
component, (iv) calculating the compatibility between the computerised model and a structure comprising the
molecular fragment and the substituent deriving from the unique component of the reagent in the conformation
predicted by step (iii), (v) optionally repeating steps (iii) and (iv) to seek a conformation with enhanced compatibility,
(vi) rejecting reagents for which a preselected degree of compatibility is not found in steps (iv) and (v), (vii)
determining a score indicative of a minimum energy level for said structure within said computerised model with the
structure and position of said molecular fragment held constant, and (viii) ranking the reagents in a list according
to the scores determined in step (vii).

13. A process according to claim 12 wherein in step (vii) scores indicative of strain energy and contributions to energy
level of individual interactions of components of said structure with said computerised model are also determined
and reagents are rejected if such scores exceed pre-selected limits indicative of undesirable conformation or
interaction.
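
The sequence of claims 12 and 13 can be sketched as plain control flow. The sketch is a hedged illustration, not the method of the specification: the matching, conformation, compatibility and energy functions are placeholders supplied by the caller, and the `Limits` fields, the `screen` function, the `TOY` data and all numeric values are hypothetical. No actual subgraph isomorphism or torsional optimization is performed here; only the order of rejection and ranking is shown.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

Conformation = Tuple[str, int]  # placeholder: (reagent label, attempt index)


@dataclass
class Limits:
    min_compatibility: float  # threshold used in step (vi)
    max_strain: float         # claim 13 limit on strain energy
    max_interaction: float    # claim 13 limit on the worst single interaction


def screen(
    reagents: Iterable[str],
    matches: Callable[[str], bool],
    propose: Callable[[str, int], Conformation],
    compatibility: Callable[[Conformation], float],
    energies: Callable[[Conformation], Tuple[float, float, float]],
    limits: Limits,
    attempts: int = 3,
) -> List[Tuple[str, float]]:
    ranked = []
    for reagent in reagents:
        if not matches(reagent):                            # steps (i) and (ii)
            continue
        candidates = [propose(reagent, k) for k in range(attempts)]  # (iii), (v)
        best = max(candidates, key=compatibility)           # step (iv)
        if compatibility(best) < limits.min_compatibility:  # step (vi)
            continue
        total, strain, worst = energies(best)               # step (vii)
        if strain > limits.max_strain or worst > limits.max_interaction:
            continue                                        # claim 13 rejection
        ranked.append((reagent, total))
    return sorted(ranked, key=lambda item: item[1])         # step (viii)


# Invented data: (matches?, compatibility per attempt, (total, strain, worst term)).
TOY = {
    "b1": (True, [0.2, 0.9, 0.5], (-5.0, 1.0, 0.5)),
    "b2": (False, [], (0.0, 0.0, 0.0)),
    "b3": (True, [0.1, 0.2, 0.3], (-4.0, 1.0, 0.5)),
    "b4": (True, [0.8, 0.7, 0.6], (-7.0, 9.0, 0.2)),
    "b5": (True, [0.6, 0.95, 0.4], (-6.0, 2.0, 1.0)),
}

result = screen(
    TOY,
    matches=lambda r: TOY[r][0],
    propose=lambda r, k: (r, k),
    compatibility=lambda conf: TOY[conf[0]][1][conf[1]],
    energies=lambda conf: TOY[conf[0]][2],
    limits=Limits(min_compatibility=0.5, max_strain=5.0, max_interaction=2.0),
)
print(result)
```

In this toy run b2 fails the match, b3 fails the compatibility threshold and b4 exceeds the strain limit, so only b5 and b1 are ranked, lowest energy first.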

14. Novel active compounds identified by a process according to any one of claims 1 to 13.
16. A method of manufacturing a drug substance, said method comprising the steps of:

(1) obtaining a computerised representation of the three-dimensional structure of a binding site on the surface [...]

[...] compounds, the lists being such that a combination of [...]

(7) optionally, selecting from said third virtual library at least one further molecular fragment and repeating
steps (4), (5) and (6) to generate an alternative third virtual library;

(8) screening said third virtual library using a second set of exclusion rules thereby to generate a restricted
fourth virtual library;

(9) synthesising some or all candidate compounds of said fourth virtual library to produce a candidate
compound library;
(10) experimentally evaluating the compounds of said candidate compound library for drug efficacy;
(11) analysing the experimental efficacy data generated in step (10) for structure-activity relationship
information;

(12) [...] selecting a revised set of lists of accessible reagent compounds [...]

(14) synthesising and experimentally evaluating said further compounds for drug efficacy;

(15) if required repeating steps (11) to (14) one or more times;
(16) identifying as a lead candidate a compound synthesized and experimentally evaluated as above;

(17) manufacturing the compound identified in step (16) above; and, optionally, 

(18) admixing the compound manufactured in step (17) above with at least one pharmaceutically acceptable
carrier or excipient. 